
    Zero Error Coordination

    In this paper, we consider a zero-error coordination problem wherein the nodes of a network exchange messages so as to perfectly coordinate their actions with each other's individual observations. While previous works on coordination commonly assume an asymptotically vanishing error, we assume exact, zero-error coordination. Furthermore, unlike previous works that employ the empirical or strong notions of coordination, we define and use a notion of set coordination, which bears similarities with the empirical notion of coordination. We observe that set coordination, in its special case of two nodes with a one-way communication link, is equivalent to the "Hide and Seek" source coding problem of McEliece and Posner. The Hide and Seek problem has known intimate connections with graph entropy, rate-distortion theory, Rényi mutual information, and even error exponents. Other special cases of the set coordination problem relate to Witsenhausen's zero-error rate and the distributed computation problem. These connections motivate a better understanding of set coordination, its connections with empirical coordination, and its study in more general setups. This paper takes a first step in this direction by proving new results for two-node networks.
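    For orientation, the contrast with the vanishing-error setting can be written out. The following is one standard formalization of empirical coordination from the literature, with notation assumed here rather than taken from this paper:

```latex
% Empirical coordination (one standard formalization; notation assumed,
% not taken from this paper): the joint type of the action sequences
% should approach a target distribution q with vanishing error,
\Pr\!\left[\,\max_{x,y}\left|\frac{1}{n}\sum_{i=1}^{n}
  \mathbf{1}\{(x_i, y_i) = (x, y)\} - q(x, y)\right| > \epsilon\,\right]
  \longrightarrow 0,
% whereas the zero-error requirement studied here demands that the
% coordination constraint hold exactly, for every realization,
% with no probability of error at all.
```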

    Efficient Systematic Encoding of Non-binary VT Codes

    Varshamov-Tenengolts (VT) codes are a class of codes which can correct a single deletion or insertion with a linear-time decoder. This paper addresses the problem of efficient encoding of non-binary VT codes, defined over an alphabet of size q > 2. We propose a simple linear-time encoding method to systematically map binary message sequences onto VT codewords. The method provides a new lower bound on the size of q-ary VT codes of length n.
    Comment: This paper will appear in the proceedings of ISIT 201
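    As a point of reference, the defining congruence of a VT code is easy to state in the binary case. The sketch below is illustrative only: it checks membership in VT_a(n) and enumerates a small code by brute force; the paper's linear-time systematic encoder for q-ary alphabets is not reproduced here.

```python
# Minimal sketch of the Varshamov-Tenengolts (VT) congruence in the
# binary case; illustrative only, not the paper's q-ary encoder.
from itertools import product

def vt_syndrome(x):
    """VT checksum: sum of i * x_i over positions i = 1..n."""
    return sum(i * bit for i, bit in enumerate(x, start=1))

def in_vt_code(x, a):
    """x is in VT_a(n) iff its checksum is a modulo n + 1."""
    return vt_syndrome(x) % (len(x) + 1) == a

# Enumerate VT_0(5) by brute force: all length-5 binary words whose
# checksum is 0 mod 6. Each VT_a(n) corrects any single deletion.
codewords = [x for x in product((0, 1), repeat=5) if in_vt_code(x, 0)]
print(len(codewords), codewords)
```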

    Improving Fairness and Privacy in Selection Problems

    Supervised learning models have been increasingly used for making decisions about individuals in applications such as hiring, lending, and college admission. These models may inherit pre-existing biases from training datasets and discriminate against protected attributes (e.g., race or gender). In addition to unfairness, privacy concerns also arise when the use of models reveals sensitive personal information. Among various privacy notions, differential privacy has become popular in recent years. In this work, we study the possibility of using a differentially private exponential mechanism as a post-processing step to improve both the fairness and the privacy of supervised learning models. Unlike many existing works, we consider a scenario where a supervised model is used to select a limited number of applicants because the number of available positions is limited. This assumption is well suited for various scenarios, such as job applications and college admission. We use "equal opportunity" as the fairness notion and show that the exponential mechanism can make the decision-making process perfectly fair. Moreover, experiments on real-world datasets show that the exponential mechanism can improve both privacy and fairness, with a slight decrease in accuracy compared to the model without post-processing.
    Comment: This paper has been accepted for publication in the 35th AAAI Conference on Artificial Intelligence
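    A minimal sketch of the kind of post-processing described, assuming a standard exponential mechanism over model scores with the privacy budget split evenly across k sequential draws; the paper's exact utility function, sensitivity analysis, and fairness adjustment are not reproduced here.

```python
# Hedged sketch: exponential-mechanism selection of k applicants from
# model scores. The budget split across draws is a simple composition
# argument, an assumption here rather than the paper's accounting.
import numpy as np

def exponential_mechanism_select(scores, k, epsilon, sensitivity=1.0, rng=None):
    """Pick k indices without replacement, each drawn with probability
    proportional to exp(epsilon * score / (2 * k * sensitivity))."""
    rng = np.random.default_rng(rng)
    scores = np.asarray(scores, dtype=float)
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        logits = epsilon * scores[remaining] / (2.0 * k * sensitivity)
        probs = np.exp(logits - logits.max())  # stable softmax weights
        probs /= probs.sum()
        pick = rng.choice(len(remaining), p=probs)
        chosen.append(remaining.pop(pick))
    return chosen

print(exponential_mechanism_select([0.9, 0.4, 0.75, 0.2, 0.85],
                                   k=2, epsilon=1.0, rng=0))
```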

    An Information-theoretical Approach to Semi-supervised Learning under Covariate-shift

    A common assumption in semi-supervised learning is that the labeled, unlabeled, and test data are drawn from the same distribution. However, this assumption is not satisfied in many applications: in many scenarios the data is collected sequentially (e.g., in healthcare) and its distribution may change over time, often exhibiting so-called covariate shift. In this paper, we propose an approach to semi-supervised learning that is capable of addressing this issue. Our framework also recovers some popular methods, including entropy minimization and pseudo-labeling. We provide new information-theoretic generalization error upper bounds inspired by our framework. Our bounds are applicable to both general semi-supervised learning and the covariate-shift scenario. Finally, we show numerically that our method outperforms previous approaches proposed for semi-supervised learning under covariate shift.
    Comment: Accepted at AISTATS 202
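    The two recovered baselines mentioned above are easy to sketch. The snippet below shows textbook entropy minimization and confidence-thresholded pseudo-labeling losses on unlabeled softmax outputs; the paper's unified objective and its handling of covariate shift are not reproduced here.

```python
# Sketch of two classical semi-supervised baselines the framework is
# said to recover; pure NumPy, operating on softmax outputs for
# unlabeled examples. Illustrative only, not the paper's objective.
import numpy as np

def entropy_minimization_loss(probs):
    """Mean Shannon entropy of predictions on unlabeled data;
    minimizing it pushes the model toward confident predictions."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

def pseudo_label_loss(probs, threshold=0.9):
    """Cross-entropy against hard pseudo-labels, kept only where the
    model is already confident (a common thresholding heuristic)."""
    p = np.clip(probs, 1e-12, 1.0)
    conf, labels = p.max(axis=1), p.argmax(axis=1)
    mask = conf >= threshold
    if not mask.any():
        return 0.0
    return float(np.mean(-np.log(p[mask, labels[mask]])))

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
print(entropy_minimization_loss(probs), pseudo_label_loss(probs))
```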

    Coding for Segmented Edit Channels

    We consider insertion and deletion channels with the additional assumption that the channel input sequence is implicitly divided into segments such that at most one edit can occur within a segment. No segment markers are available in the received sequence. We propose code constructions for the segmented deletion, segmented insertion, and segmented insertion-deletion channels based on subsets of Varshamov-Tenengolts codes chosen with pre-determined prefixes and/or suffixes. The proposed codes, constructed for any finite alphabet, are zero-error and can be decoded segment by segment. We also derive an upper bound on the rate of any zero-error code for the segmented edit channel, in terms of the segment length. This upper bound shows that, as the segment length increases, the rate scaling of the proposed codes is the same as that of the maximal code.
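    To illustrate why per-segment decoding is possible, the sketch below recovers a single deletion inside one segment by brute force, using membership in a binary VT code; the paper's prefix/suffix-constrained constructions and its linear-time decoder are not reproduced here.

```python
# Illustrative brute-force recovery of one deletion within a segment,
# using VT-code membership; not the paper's construction or decoder.
def in_vt_code(x, a):
    """x lies in VT_a(n) iff the sum of i * x_i is a modulo n + 1."""
    return sum(i * b for i, b in enumerate(x, start=1)) % (len(x) + 1) == a

def recover_deletion(received, a, n):
    """Reinsert one bit at every position of the shortened segment and
    return the candidates lying in VT_a(n); a single-deletion-correcting
    code guarantees exactly one codeword survives."""
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            cand = received[:pos] + (bit,) + received[pos:]
            if in_vt_code(cand, a):
                candidates.add(cand)
    return candidates

sent = (1, 0, 0, 0, 1)            # in VT_0(5): 1 + 5 = 6 = 0 (mod 6)
assert in_vt_code(sent, 0)
received = sent[:2] + sent[3:]    # channel deletes the third bit
print(recover_deletion(received, a=0, n=5))
```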

    Learning machines for health and beyond

    Machine learning techniques are effective for building predictive models because they are good at identifying patterns in large datasets. However, development of a model for complex real-life problems often stops at the point of publication, proof of concept, or deployment, and a model in the medical domain risks becoming obsolete as soon as patient demographics change. Because these models are trained to find patterns in the datasets available at the time, their performance will not peak and remain fixed at the point of publication or even the point of deployment; rather, the data changes over time, and it also changes when models are transported to new settings and used on new populations. The maintenance and monitoring of predictive models post-publication is therefore crucial to guarantee their safe and effective long-term use.
    Comment: 12 pages, 3 figures

    Symbolic Metamodels for Interpreting Black-Boxes Using Primitive Functions

    One approach for interpreting black-box machine learning models is to find a global approximation of the model using simple interpretable functions, called a metamodel (a model of the model). Approximating the black box with a metamodel can be used to 1) estimate instance-wise feature importance; 2) understand the functional form of the model; 3) analyze feature interactions. In this work, we propose a new method for finding interpretable metamodels. Our approach utilizes the Kolmogorov superposition theorem, which expresses multivariate functions as a composition of univariate functions (our primitive parameterized functions). This composition can be represented in the form of a tree. Inspired by symbolic regression, we use a modified form of genetic programming to search over different tree configurations. Gradient descent (GD) is used to optimize the parameters of a given configuration. Our method is a novel memetic algorithm that uses GD not only for training numerical constants but also for training the building blocks themselves. In several experiments, we show that our method outperforms recent metamodeling approaches suggested for interpreting black boxes.
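    A toy sketch of what a metamodel in Kolmogorov-superposition form can look like, with an assumed dictionary of parameterized univariate primitives; the genetic-programming search over tree configurations and the gradient-based training of building blocks are not reproduced here.

```python
# Toy metamodel in Kolmogorov-superposition form
#   f(x) = sum_q Phi_q( sum_p phi_{q,p}(x_p) ),
# with an assumed primitive dictionary; evaluation only, no search.
import numpy as np

# Small dictionary of parameterized univariate primitives (assumed).
PRIMITIVES = {
    "affine": lambda x, a, b: a * x + b,
    "sin":    lambda x, a, b: np.sin(a * x + b),
    "exp":    lambda x, a, b: np.exp(np.clip(a * x + b, -20, 20)),
}

def evaluate_metamodel(structure, x):
    """structure: list of outer terms, each a pair (outer primitive
    spec, list of inner primitive specs, one per input feature).
    A spec is (name, a, b); x is a 1-D array of input features."""
    total = 0.0
    for outer, inners in structure:
        inner_sum = sum(PRIMITIVES[name](xp, a, b)
                        for (name, a, b), xp in zip(inners, x))
        name, a, b = outer
        total += PRIMITIVES[name](inner_sum, a, b)
    return total

# Example: a single outer sin-term over affine maps of a 2-D input.
structure = [(("sin", 1.0, 0.0),
              [("affine", 2.0, 0.0), ("affine", 1.0, -1.0)])]
print(evaluate_metamodel(structure, np.array([0.3, 0.7])))
```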